Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove RWLock from EntryNotifier because it causes perf degradation #33797

Merged

Conversation

vovkman
Copy link
Contributor

@vovkman vovkman commented Oct 20, 2023

Problem

When upgrading to 1.16 we observed that some of our geyser nodes could not keep up with the chain and had wild CPU fluctuations (running on 7443p). The first solution we found was to increase CPU (7543p), which sort of solved the issue. We profiled a local cluster to find any hotspots and found that the EntryNotifier was using significantly more CPU time than any other Geyser method. It seemed to be due to some thread contention from recv_timeout or the write lock is was obtaining. As a simple first step I removed the RWLock on EntryNotifier because it is unnecessary and re profiled. This removed the CPU overhead. I tested this change on MB with our geyser plugins and they are now stable on 7443p again.

Here is an example running yellowstone grpc plugin. This is from running a local cluster for about 10 minutes and shows he perf improvement. On MB the impact is much more severe, I didn't profile there though due to time/perf data size. Since a test of this change on MB fixed the issue I felt this was enough to illustrate the problem/solution.

With RWLock - 0.234% CPU time
Screenshot 2023-10-19 at 4 34 04 PM

Without RWLock - 0.147% CPU time
Screenshot 2023-10-19 at 4 28 38 PM

Summary of Changes

Remove RWLock from EntryNotifier

…hen entry notifications are enabled on geyser
@mergify mergify bot added community Community contribution need:merge-assist labels Oct 20, 2023
@mergify mergify bot requested a review from a team October 20, 2023 21:16
@vovkman vovkman changed the title Remove RWLock from EntryNotifier because it causes perf degradation w… Remove RWLock from EntryNotifier because it causes perf degradation Oct 20, 2023
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Oct 27, 2023
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Oct 27, 2023
@lijunwangs lijunwangs added the CI Pull Request is ready to enter CI label Oct 27, 2023
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Oct 27, 2023
@@ -8,4 +8,4 @@ pub trait EntryNotifier {
fn notify_entry(&self, slot: Slot, index: usize, entry: &EntrySummary);
}

pub type EntryNotifierLock = Arc<RwLock<dyn EntryNotifier + Sync + Send>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the unused RwLock to make sure clippy is happy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@vovkman
Copy link
Contributor Author

vovkman commented Nov 1, 2023

@lijunwangs do these remaining buildkite steps need to pass? They seem to be pulling in the wrong dependencies and failing bc of that

@CriesofCarrots
Copy link
Contributor

CriesofCarrots commented Nov 1, 2023

@lijunwangs do these remaining buildkite steps need to pass? They seem to be pulling in the wrong dependencies and failing bc of that

Those failures seem legitimate, and not related to dependencies:

Checking solana-geyser-plugin-manager v1.18.0 (/solana/geyser-plugin-manager)
error[E0277]: the trait bound `std::sync::RwLock<EntryNotifierImpl>: EntryNotifier` is not satisfied
   --> geyser-plugin-manager/src/geyser_plugin_service.rs:105:18
    |
105 |             Some(Arc::new(RwLock::new(entry_notifier)))
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `EntryNotifier` is not implemented for `std::sync::RwLock<EntryNotifierImpl>`
    = help: the trait `EntryNotifier` is implemented for `EntryNotifierImpl`
    = note: required for the cast from `Arc<std::sync::RwLock<EntryNotifierImpl>>` to `Arc<dyn EntryNotifier + std::marker::Send + Sync>`

For more information about this error, try `rustc --explain E0277`.
could not compile `solana-geyser-plugin-manager` (lib) due to previous error

Copy link

codecov bot commented Nov 1, 2023

Codecov Report

Merging #33797 (14e38fa) into master (63abc72) will increase coverage by 0.0%.
Report is 1 commits behind head on master.
The diff coverage is 0.0%.

@@           Coverage Diff           @@
##           master   #33797   +/-   ##
=======================================
  Coverage    81.9%    81.9%           
=======================================
  Files         809      809           
  Lines      218253   218252    -1     
=======================================
+ Hits       178771   178821   +50     
+ Misses      39482    39431   -51     

@vovkman
Copy link
Contributor Author

vovkman commented Nov 1, 2023

@lijunwangs do these remaining buildkite steps need to pass? They seem to be pulling in the wrong dependencies and failing bc of that

Those failures seem legitimate, and not related to dependencies:

Checking solana-geyser-plugin-manager v1.18.0 (/solana/geyser-plugin-manager)
error[E0277]: the trait bound `std::sync::RwLock<EntryNotifierImpl>: EntryNotifier` is not satisfied
   --> geyser-plugin-manager/src/geyser_plugin_service.rs:105:18
    |
105 |             Some(Arc::new(RwLock::new(entry_notifier)))
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `EntryNotifier` is not implemented for `std::sync::RwLock<EntryNotifierImpl>`
    = help: the trait `EntryNotifier` is implemented for `EntryNotifierImpl`
    = note: required for the cast from `Arc<std::sync::RwLock<EntryNotifierImpl>>` to `Arc<dyn EntryNotifier + std::marker::Send + Sync>`

For more information about this error, try `rustc --explain E0277`.
could not compile `solana-geyser-plugin-manager` (lib) due to previous error

Oh rip I must have missed a push. Updated, all passing now

Copy link
Contributor

@lijunwangs lijunwangs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@lijunwangs lijunwangs merged commit e840b97 into solana-labs:master Nov 6, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution need:merge-assist
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants